Machine Learning with Annotator Rationales to Reduce Annotation Cost
Abstract
We review two novel methods for text categorization, based on a new framework that utilizes richer annotations that we call annotator rationales. A human annotator provides hints to a machine learner by highlighting contextual “rationales” in support of each of his or her annotations. We have collected such rationales, in the form of substrings, for an existing document sentiment classification dataset [1]. We have developed two methods, one discriminative [2] and one generative [3], that use these rationales during training to obtain significant accuracy improvements over two strong baselines. Our generative model in particular could be adapted to help learn other kinds of probabilistic classifiers for quite different tasks. Based on a small study of annotation speed, we posit that for some tasks, providing rationales can be a more fruitful use of an annotator’s time than annotating more examples.
Similar resources
Using "Annotator Rationales" to Improve Machine Learning for Text Categorization
We propose a new framework for supervised machine learning. Our goal is to learn from smaller amounts of supervised training data, by collecting a richer kind of training data: annotations with “rationales.” When annotating an example, the human teacher will also highlight evidence supporting this annotation—thereby teaching the machine learner why the example belongs to the category. We provid...
Crowdsourcing Annotation for Machine Learning in Natural Language Processing Tasks
Human annotators are critical for creating the necessary datasets to train statistical learners, but annotation cost and limited access to qualified annotators form a data bottleneck. In recent years, researchers have investigated overcoming this obstacle using crowdsourcing, which is the delegation of a particular task to a large group of untrained individuals rather than a select trained few...
How well does active learning actually work? Time-based evaluation of cost-reduction strategies for language documentation.
Machine involvement has the potential to speed up language documentation. We assess this potential with timed annotation experiments that consider annotator expertise, example selection methods, and suggestions from a machine classifier. We find that better example selection and label suggestions improve efficiency, but effectiveness depends strongly on annotator expertise. Our expert performed...
Modeling Annotators: A Generative Approach to Learning from Annotator Rationales
A human annotator can provide hints to a machine learner by highlighting contextual “rationales” for each of his or her annotations (Zaidan et al., 2007). How can one exploit this side information to better learn the desired parameters θ? We present a generative model of how a given annotator, knowing the true θ, stochastically chooses rationales. Thus, observing the rationales helps us infer t...
Minimizing the Costs in Generalized Interactive Annotation Learning
Supervised learning involves collecting unlabeled data, defining features to represent an instance, obtaining annotations for the unlabeled instances, and learning a classifier from the annotated data. Each of these steps has an associated cost. In this thesis, our goal is to reduce the total cost for the desired performance in supervised learning. Specifically, we focus on reducing the cost of...
Publication date: 2008